# TinyV reward system

This model is fine-tuned from Qwen/Qwen2.5-1.5B-Instruct for use in the TinyV reward system. It can provide more accurate reward signals during reinforcement learning (RL) post-training, which can significantly improve both RL efficiency and the performance of the final model.
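The description does not spell out how the reward is computed. As a rough illustration only, the sketch below assumes a setup in which a strict rule-based check runs first and a small model-based verifier is consulted only when that check fails. The names `normalize`, `reward`, and `toy_verifier` are hypothetical, and `toy_verifier` is a plain-Python stand-in for a call to the fine-tuned model, not the actual TinyV interface.

```python
from typing import Callable

# Signature assumed for illustration: (question, reference, response) -> bool
Verifier = Callable[[str, str, str], bool]


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(answer.strip().lower().split())


def reward(question: str, reference: str, response: str, verifier: Verifier) -> float:
    """Return 1.0 if the response is judged correct, else 0.0."""
    # Step 1: cheap rule-based check (string match after normalization).
    if normalize(response) == normalize(reference):
        return 1.0
    # Step 2: the rule-based check failed, so ask the verifier whether the
    # response is still equivalent to the reference.
    return 1.0 if verifier(question, reference, response) else 0.0


def toy_verifier(question: str, reference: str, response: str) -> bool:
    """Hypothetical stand-in for a model call: compares answers numerically."""
    try:
        return abs(float(response) - float(reference)) < 1e-9
    except ValueError:
        return False


if __name__ == "__main__":
    # "0.50" fails the string match but is numerically equal to "0.5".
    print(reward("What is 1/2?", "0.5", "0.50", toy_verifier))
```

In a real pipeline the stand-in would be replaced by a call to the verifier model; keeping it behind a plain callable makes that swap a local change.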

Large Language Model

© 2025 AIbase